Multi-Resolution Language Grounding with Weak Supervision

نویسندگان

  • Rik Koncel-Kedziorski
  • Hannaneh Hajishirzi
  • Ali Farhadi
چکیده

Language is given meaning through its correspondence with a world representation. This correspondence can be at multiple levels of granularity or resolutions. In this paper, we introduce an approach to multi-resolution language grounding in the extremely challenging domain of professional soccer commentaries. We define and optimize a factored objective function that allows us to leverage discourse structure and the compositional nature of both language and game events. We show that finer resolution grounding helps coarser resolution grounding, and vice versa. Our method results in an F1 improvement of more than 48% versus the previous state of the art for fine-resolution grounding1.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finding “It”: Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

Grounding textual phrases in visual content with standalone image-sentence pairs is a challenging task. When we consider grounding in instructional videos, this problem becomes profoundly more complex: the latent temporal structure of instructional videos breaks independence assumptions and necessitates contextual understanding for resolving ambiguous visual-linguistic cues. Furthermore, dense ...

متن کامل

Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy"

Grounded language learning bridges words like ‘red’ and ‘square’ with robot perception. The vast majority of existing work in this space limits robot perception to vision. In this paper, we build perceptual models that use haptic, auditory, and proprioceptive data acquired through robot exploratory behaviors to go beyond vision. Our system learns to ground natural language words describing obje...

متن کامل

Generation and grounding of natural language descriptions for visual data

Generating natural language descriptions for visual data links computer vision and computational linguistics. Being able to generate a concise and human-readable description of a video is a step towards visual understanding. At the same time, grounding natural language in visual data provides disambiguation for the linguistic concepts, necessary for many applications. This thesis focuses on bot...

متن کامل

Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multiinstance lear...

متن کامل

Analysis of Audio-Visual Features for Unsupervised Speech Recognition

Research on “zero resource” speech processing focuses on learning linguistic information from unannotated, or raw, speech data, in order to bypass the expensive annotations required by current speech recognition systems. While most recent zero-resource work has made use of only speech recordings, here, we investigate the use of visual information as a source of weak supervision, to see whether ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014